Abstract. Algorithms play an increasingly important role in public policy decision-making. Despite this consequential role, little effort has been made to evaluate the extent to which people trust algorithms in decision-making, much less the personality characteristics associated with higher levels of trust. Such evaluations inform the widespread adoption and efficacy of algorithms in public policy decision-making. We explore the role of major personality inventories (need for cognition, need to evaluate, and the “Big 5”) in shaping an individual’s trust in public policy algorithms, specifically in criminal justice sentencing. To explore personality in this context, we fielded an original survey experiment aimed at assessing the impact of varying advice sources on forecasts of criminal recidivism, conditioned by personality traits. As expected, we found strong correlations between all personality types and general levels of trust in automation. Further, we uncovered evidence that need for cognition increases the weight given to advice from an algorithm relative to humans, and that “agreeableness” decreases the distance between respondents’ expectations and advice from a judge, relative to advice from a crowd.
1 Introduction


Algorithms are increasingly important in public policy implementation (2022). Algorithms assist officials in major US cities in allocating resources (O’Brien, 2015), judges in detecting gerrymandering (2017), and the military in controlling weapons (2018). Recently, algorithms have also begun to play a role in criminal sentencing, where judges use them to inform expectations of a defendant’s probability of recidivating (Waggoner & Macmillen, 2021). Such a hybrid decision-making process between humans and algorithms influences the parameters, duration, and severity of sentencing (Dressel & Farid, 2018).
Despite the rise in interest in automation and algorithms, little attention has been paid in public policy to algorithms or to the psychological factors that influence trust in them. Prior work has explored the situations under which people approve the development of autonomous weapons systems, but this reveals little about the underlying trust people place in algorithms in practice. Further, some have debated whether individuals place low levels of trust in algorithms, “algorithm aversion” (Dietvorst, Simmons, & Massey, 2015), or high levels of trust, “algorithm bias” (Logg, 2016). Still, very little attention has been paid to how individuals’ psychological characteristics might influence attitudes towards algorithms. Instead, the literature tends to focus on demographic or cultural factors.
To address this gap, we recently fielded a criminal sentencing survey experiment and leveraged three major inventories of psychological measures of personality to explore who is more or less trusting of algorithms: “need for cognition” (NC) (Cacioppo & Petty, 1982), “need to evaluate” (NE) (Bizer et al., 2004), and the “Big 5” (Norman, 1963).


The survey experiment primarily assessed the impact of varying advice sources (a judge, an algorithm, and a “crowd” of peers) on respondents’ forecasts of criminal recidivism. Of primary interest was the conditioning role of personality in this forecasting effort. The design of and findings from the experiment are described in the remainder of the paper.


1.1 Personality Inventories 


The first inventory, need for cognition (NC), is associated with individuals who have a strong desire to learn and grow (Cacioppo & Petty). Some previous studies on NC in similar contexts have found that when high NC individuals are asked to undertake a task in which they are given little information and are then provided expert advice, they are more likely to assign greater weight to that advice rather than relying on heuristics (2004). This suggests that high NC individuals will be more likely to take advice insofar as they view that advice as “expert,” given their more elaborate processing of information (Sicilia, Ruiz, & Munuera, 2005).


Our second inventory, need to evaluate (NE), is associated with individuals who tend to generate and retain their own attitudes (2004). This “self-monitoring” personality is also associated with the need to control (Snyder) and to constantly evaluate social surroundings (1996). Such attributes induce greater reliance on intuition over outside sources. Past work has demonstrated that high NE individuals tend to make spontaneous judgements in response to stimuli (2001). This “on-line” form of information processing suggests that when people come into contact with outside information, their personality plays a key role in determining how readily they accept that information. The resulting attitudes are much stronger than those formed through the alternative, “memory-based” processing (Bizer, Tormala, Rucker, & Petty, 2006). Other studies have also leveraged NE to explain information processing (Druckman & Nelson, 2003). In our context, we expect high NE individuals to rely more on their own intuition than on other sources, which should lead to distrust. When a high NE individual is confronted with advice from an outside source, we expect them to be less trusting of that advice, regardless of its origin. This is in line with recent work suggesting that errors experienced from an algorithm provoke stronger distrust of that advice than do errors experienced from other sources (Dietvorst et al., 2015).


For measurement, we used the two-item battery for each of these personality types (NC and NE), totaling four questions. Question wording is in the Appendix. Though there is a tradeoff between “internal reliability and brevity” (2011), we opted for the smaller battery for two reasons. First, it is the same approach as using the common, reliable TIPI inventory to measure each of the Big 5 traits (2003); both approaches use two items per personality trait to generate a measure. Second, we wanted to ensure high response rates, given the inclusion of the personality batteries in addition to our main experiment. To minimize the burden on respondents, and with “the benefit of being short enough to be included in large political surveys” (268), we opted for the smaller battery. Ultimately, we selected these measures of personality given their widespread use in a variety of fields, including political science (Jost, Glaser, Kruglanski, & Sulloway, 2003), public policy (Sargent, 2004), psychology (Cacioppo, Petty, & Feng Kao, 1984), and others (Luttrell).


Turning now to the Big 5, we use only the “agreeableness” and “openness to experience” traits in our study, as they can be most clearly linked to trust in automation. We selected only two of the five traits because, as others have noted, “in most cases only some of the Big Five traits significantly predict outcomes of interest” (268). Our approach is similar to other studies on the role of the Big 5 in behavior that select only the specific personality traits that can be clearly linked to substantive phenomena (Quintelier, 2014). For agreeableness, prior work (2011) and John and Srivastava (1999) note that “agreeableness contrasts a prosocial and communal orientation toward others with antagonism and includes traits such as altruism, tender-mindedness, trust, and modesty.” Agreeableness is also associated with social conformity and compliance (1981). In our context, when respondents are given advice from an “expert” and then asked whether they wish to update their expectation, we expect agreeable individuals to respond positively to the advice-giver, regardless of the source of advice. In an effort to conform to the reigning wisdom conveyed by the advice treatment, individuals who are high on agreeableness should trust automation, positively weight expert advice, and align with the advice-giver.


Second, openness is associated with originality (Gerber et al.; 1999), intellectual curiosity (Peabody & Goldberg, 1989), and an eagerness to learn (Barrick & Mount, 1991). As individuals who are open to experience come into contact with outside advice in an unfamiliar realm, they should respond positively to the advice treatment across all three measures of trust discussed below.

We leveraged the Ten-Item Personality Inventory (TIPI) (Gosling et al.) to measure these traits. Two items containing personality adjectives are associated with each trait, with one phrase coded normally and the other reverse coded (e.g., for “agreeableness”: item 7 = sympathetic, warm; item 2 (reverse coded) = critical, quarrelsome).
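The reverse-coding rule just described can be sketched as follows. This is a minimal illustration, assuming responses on the 1-7 TIPI scale and the standard TIPI key (agreeableness: item 7 normal, item 2 reversed; openness: item 5 normal, item 10 reversed). The names `TIPI_KEY`, `reverse_code`, and `tipi_score` are hypothetical, not taken from the authors' materials.

```python
# Sketch of TIPI trait scoring: one normally coded item and one
# reverse-coded item per trait, each on a 1-7 agreement scale.
SCALE_MIN, SCALE_MAX = 1, 7

# (normal item, reverse-coded item), using 1-based TIPI item numbers.
TIPI_KEY = {
    "agreeableness": (7, 2),  # sympathetic, warm / critical, quarrelsome
    "openness": (5, 10),      # open to new experiences / conventional, uncreative
}


def reverse_code(x: int) -> int:
    """Flip a 1-7 response: 1 <-> 7, 2 <-> 6, 3 <-> 5, 4 stays 4."""
    return SCALE_MIN + SCALE_MAX - x


def tipi_score(responses: dict[int, int], trait: str) -> float:
    """Average the normal item with the reverse-coded item for one trait."""
    normal, reverse = TIPI_KEY[trait]
    return (responses[normal] + reverse_code(responses[reverse])) / 2


answers = {2: 2, 5: 6, 7: 6, 10: 3}
print(tipi_score(answers, "agreeableness"))  # (6 + 6) / 2 = 6.0
print(tipi_score(answers, "openness"))       # (6 + 5) / 2 = 5.5
```

Averaging (rather than summing) keeps each trait score on the original 1-7 range, which makes scores comparable across traits.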


2 Method 


2.1 Participants 


We used Amazon’s Mechanical Turk (MTurk) to recruit 395 subjects, each of whom was paid $2.00 for participation. MTurk is a valid, widely used platform for fielding political, psychological, and social experiments similar to ours (Clifford, Jewell, & Waggoner, 2015). Additional details of the study design are included in the Appendix.


2.2 Procedure 


Our study contains observational (general trust) and experimental (behavioral impact) components. For the observational component, respondents were given an eight-item battery of questions related to degrees of trust in automation (2008). Respondents were asked their level of agreement, on a scale from 1 (strongly disagree) to 7 (strongly agree), with statements such as “Using algorithms improves the output quality for organizations.” These items were aggregated into a 7-point scale where 7 indicates high trust in algorithms and 1 indicates low trust. The wording for all of the items is available in the supporting information. This scale is the dependent variable for the first stage of the analysis, which is analyzed using OLS regression and presented in Table 1.
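The aggregation into a single 1-7 index can be illustrated with a short sketch. It assumes, as this section states, that raw responses run from 1 (strongly disagree) to 7 (strongly agree), and that the index is the mean of the eight recoded items. Which items were reverse-coded is not listed in the paper; treating the negatively worded items 4, 5, and 8 from Appendix A.6 as the reversed ones is an assumption, as is the function name `trust_index`.

```python
# Hypothetical sketch of the trust-in-automation index: eight 1-7 items,
# negatively worded ones flipped, then averaged so 7 = high trust.

# Assumption: the negatively worded items (4, 5, 8 in Appendix A.6)
# are the reverse-coded ones; the paper does not say which were flipped.
REVERSED_ITEMS = {4, 5, 8}
SCALE_MIN, SCALE_MAX = 1, 7


def trust_index(responses: list[int]) -> float:
    """Mean of the eight items after flipping the negatively worded ones."""
    if len(responses) != 8:
        raise ValueError("expected responses to all eight items")
    if any(not SCALE_MIN <= r <= SCALE_MAX for r in responses):
        raise ValueError("responses must lie on the 1-7 scale")
    recoded = [
        SCALE_MIN + SCALE_MAX - r if item in REVERSED_ITEMS else r
        for item, r in enumerate(responses, start=1)
    ]
    return sum(recoded) / len(recoded)


# Agrees with the positive items, disagrees with the negative ones.
print(trust_index([6, 6, 6, 2, 2, 6, 5, 3]))  # 5.75
```

Because every item is mapped onto the same 1-7 direction before averaging, the resulting index stays on the 7-point scale described above.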

For the experimental component, respondents were asked to forecast the probability of a defendant committing another crime within two years for each of eight real, randomly selected criminal profiles based on criminal history and defendant demographic characteristics. After each initial forecast, the respondent was given “advice” from a source (listed below) and asked whether they wanted to update their prediction or leave it the same (manual entry was required both times). The shifts in respondents’ predictions (or lack thereof) are the quantity of interest in our study. We included two attention checks to minimize satisficing (2016). Specifically, respondents were warned if they missed one attention check, and were removed and not paid if they failed both. About 80% of respondents who attempted the survey passed the checks and completed the survey.

The presentation of our criminal profiles mimics the formatting used in prior work (2018), which was shown to provide a sufficient amount of detail for an average MTurk participant to make an informed judgment, with expected accuracy similar to that of the popular “COMPAS” algorithm. The full wording is available in the supporting information. We randomly selected 20 pre-trial defendants from the 2013-14 Broward County, FL database, all of whom had risk scores between 2 and 8 (derived from the COMPAS algorithm, which ranks defendants from 1 to 10, with 10 being the most likely to recidivate). This pool was first winnowed by removing defendants whose alleged crimes were obscure, and then reduced again at random to limit the task burden on respondents, which left us with eight profiles.
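The three-step selection (score filter, random draw of 20, removal of obscure charges, random draw of 8) can be sketched as below. This is an illustration under stated assumptions: the record fields `decile_score` and `charge`, the toy records, and the function `select_profiles` are all hypothetical, and the set of "obscure" charges is passed in because the paper does not list which charges were dropped.

```python
import random


def select_profiles(records, obscure_charges, n_initial=20, n_final=8, seed=None):
    """Keep COMPAS scores 2-8, draw n_initial at random, drop obscure
    charges, then draw n_final at random from what remains."""
    rng = random.Random(seed)
    eligible = [r for r in records if 2 <= r["decile_score"] <= 8]
    pool = rng.sample(eligible, n_initial)
    pool = [r for r in pool if r["charge"] not in obscure_charges]
    if len(pool) < n_final:
        raise ValueError("too few profiles left after removing obscure charges")
    return rng.sample(pool, n_final)


# Toy records standing in for the Broward County data (hypothetical values).
toy = [
    {"id": i, "decile_score": i % 10 + 1,
     "charge": "Rare Charge" if i % 50 == 1 else "Grand Theft"}
    for i in range(200)
]
chosen = select_profiles(toy, obscure_charges={"Rare Charge"}, seed=42)
print([r["id"] for r in chosen])
```

Filtering on the score range before sampling guarantees every selected profile falls in the 2-8 band; the explicit error makes it visible if the obscure-charge filter leaves too few cases.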


For each profile, respondents were randomly assigned to one of three advice conditions: a judge with 10 years of experience in criminal sentencing; a computer algorithm designed by computer scientists and criminal justice officials; or the average of a previous survey of 300 Turkers. The treatment conditions are coded as separate dummy variables for whether the individual saw advice from an algorithm or a judge in the scenario, with the previous MTurk survey as the baseline condition. In addition to the main personality predictors, we control for several common factors in public policy experiments, including age, education, gender, and partisanship.
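The coding scheme above (two dummies with the crowd condition omitted, plus the treatment-by-personality terms reported in Table 2 as, e.g., “Alg. x NC”) can be sketched as follows. The function and column names are hypothetical; only NC and NE interactions are shown, and the control variables are left out for brevity.

```python
import random

CONDITIONS = ("crowd", "algorithm", "judge")  # crowd = MTurk baseline


def assign_condition(rng: random.Random) -> str:
    """Draw one advice source for a profile with equal probability."""
    return rng.choice(CONDITIONS)


def design_row(condition: str, nc: float, ne: float) -> dict[str, float]:
    """Treatment dummies (crowd omitted) plus treatment-by-personality terms."""
    if condition not in CONDITIONS:
        raise ValueError(f"unknown condition: {condition!r}")
    alg = 1.0 if condition == "algorithm" else 0.0
    judge = 1.0 if condition == "judge" else 0.0
    return {
        "algorithm": alg,
        "judge": judge,
        "nc": nc,
        "ne": ne,
        "alg_x_nc": alg * nc,
        "alg_x_ne": alg * ne,
        "judge_x_nc": judge * nc,
        "judge_x_ne": judge * ne,
    }


rng = random.Random(0)
print(design_row(assign_condition(rng), nc=4.0, ne=2.5))
```

Because the crowd condition is the omitted category, a crowd row has both dummies and all interaction terms equal to zero, so the treatment coefficients are read as differences from crowd advice.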


We evaluate two behavioral measures of trust. The first measure is the “weight of advice” (Gino & Moore; Logg, 2016). This variable is calculated as $|u_{2i} - u_{1i}| / |a_i - u_{1i}|$, where $u_{2i}$ is the respondent’s final assigned probability of recidivism, $u_{1i}$ is their initial prediction, and $a_i$ is the advice they were given by one of the sources. A score of 1 suggests the respondent relied only on the advice from the source, whereas 0.5 suggests they weighted the source and their own prediction equally, and 0 means the respondent ignored the advice. Our second measure is the distance to advice, measured as $|a_i - u_{2i}|$. Lower values indicate less distance between the respondent’s final forecast and the advice they were given.
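Both measures can be computed directly from the three quantities defined above. The sketch below uses hypothetical function names; returning `None` when the advice equals the initial forecast is a choice made here, because the ratio is undefined in that case and the paper does not state how such cases were treated.

```python
from typing import Optional


def weight_of_advice(initial: float, final: float, advice: float) -> Optional[float]:
    """|u2 - u1| / |a - u1|; undefined (None) when advice equals the initial forecast."""
    if advice == initial:
        return None
    return abs(final - initial) / abs(advice - initial)


def distance_to_advice(final: float, advice: float) -> float:
    """|a - u2|: how far the final forecast remains from the advice."""
    return abs(advice - final)


# Initial forecast 40%, advice 80%:
print(weight_of_advice(40, 40, 80))  # 0.0  (ignored the advice)
print(weight_of_advice(40, 60, 80))  # 0.5  (met the advice halfway)
print(weight_of_advice(40, 80, 80))  # 1.0  (adopted the advice)
print(distance_to_advice(60, 80))    # 20
```

Note that the weight can exceed 1 if a respondent moves past the advice; again, the paper does not say whether such values were truncated.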


We modeled the weight and distance measures by fitting multilevel regressions to the data after pooling across all criminal profiles and specifying varying intercepts for defendant descriptions and respondents. Multilevel models were chosen to account for unobserved heterogeneity at both the individual respondent and scenario levels. This provides efficient and accurate estimates for experiments in which respondents evaluate multiple, different scenarios (… & Hill, 2006). The model was specified as

$$y_{ijk} = \alpha + \phi_j + \kappa_k + \beta X_{ijk} + \epsilon_{ijk} \qquad (1)$$

where $\alpha$ is the overall intercept, $\phi_j \sim N(0, 1)$ is the random intercept for defendant description $j$, $\kappa_k \sim N(0, 1)$ is the random intercept for individual respondent $k$, $\beta$ is the vector of coefficients for the predictors $X$ (the treatment indicators, personality measures, their interactions, and controls), and $\epsilon_{ijk}$ is the error term. Results are presented in Table 2.
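To make the crossed structure of Eq. (1) concrete, the sketch below simulates data from it: every respondent sees every profile, so each observation carries both a profile intercept and a respondent intercept. The parameter values, the single binary predictor, and the function name `simulate_eq1` are hypothetical; the default standard deviations of 1 mirror the $N(0, 1)$ intercepts stated above. The authors' estimation software is not stated; one common way to fit such a model is R's lme4, e.g. `lmer(y ~ x + (1 | profile) + (1 | respondent))`.

```python
import random


def simulate_eq1(n_respondents=50, n_profiles=8, alpha=0.2, beta=0.1,
                 sd_profile=1.0, sd_respondent=1.0, sd_error=1.0, seed=None):
    """Draw data from y_ijk = alpha + phi_j + kappa_k + beta * x_ijk + eps_ijk."""
    rng = random.Random(seed)
    phi = [rng.gauss(0.0, sd_profile) for _ in range(n_profiles)]          # profile intercepts
    kappa = [rng.gauss(0.0, sd_respondent) for _ in range(n_respondents)]  # respondent intercepts
    rows = []
    for k in range(n_respondents):
        for j in range(n_profiles):
            x = rng.randint(0, 1)  # hypothetical binary treatment indicator
            y = alpha + phi[j] + kappa[k] + beta * x + rng.gauss(0.0, sd_error)
            rows.append({"respondent": k, "profile": j, "x": x, "y": y})
    return rows


data = simulate_eq1(seed=7)
print(len(data), data[0])
```

Because the profile and respondent intercepts are shared across many rows, ordinary regression would treat those rows as independent; the multilevel model absorbs that shared variation into $\phi_j$ and $\kappa_k$.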
3 Results


For the observational analysis, note the statistical significance and large magnitudes of the effects for all personality indicators in the top four rows of Table 1. High NE respondents are less likely to trust algorithms (β = -0.14), whereas respondents higher on NC (β = 0.25), agreeableness (β = 0.04), and openness (β = 0.07), all of whom are eager to learn, are more trusting of automation, in line with expectations.


Table 1. The Impact of Personality on Trust in Automation

                          Dependent variable: Trust in Automation
                          (1)                  (2)
Need for Cognition        0.245*** (0.023)
Need to Evaluate          -0.136*** (0.021)
Agreeableness                                  0.040** (0.016)
Openness to Experience                         0.066*** (0.015)
Age                       0.005*** (0.002)     0.007*** (0.002)
Education                 0.131*** (0.029)     -0.020 (0.040)
Female                    -0.212*** (0.035)    -0.336*** (0.040)
Partisanship              -0.047*** (0.009)    -0.028*** (0.010)
Algorithm Condition       -0.000 (0.00000)     -0.000 (0.00000)
Judge Condition           -0.000 (0.00000)     0.000 (0.00000)
Constant                  3.783*** (0.124)     4.047*** (0.173)
N                         3,022                2,233
Log Likelihood            32,311.620           24,075.100
Akaike Inf. Crit.         -64,599.240          -48,126.190
Bayesian Inf. Crit.       -64,527.070          -48,057.660

Note: *p<0.1; **p<0.05; ***p<0.01


Results from the experimental stage, which explores the impact of personality on behavioral tasks, are shown in Table 2 and Figure 1. Here our N is higher because each respondent evaluated 8 defendant profiles. NC plays a strong conditioning role in the relative weight respondents assign to advice in both “expert” conditions compared to the baseline category. The degree to which NC conditions trust in algorithms is nearly double that of the judge condition (β = 0.09 compared to β = 0.05). Further, the weight effect is reversed for NE individuals in the algorithm condition (β = -0.04) and indistinguishable from zero in the judge condition. Results are similar for high openness personality types in their weighting of algorithmic advice relative to humans.

1 Of note, the trust in automation index is measured at the individual level, but the data are organized at the scenario level; hence the larger N in the tables relative to the number of recruited subjects.


Table 2. The Impact of Personality on Behavior

                      Dependent variable:
                      Weight             Distance            Weight             Distance
                      (1)                (2)                 (3)                (4)
Need for Cognition    -0.047*** (0.018)  1.687* (0.892)
Need to Evaluate      0.012 (0.017)      -0.964 (0.839)
Agreeableness                                                0.018 (0.013)      -0.501 (0.642)
Openness                                                     -0.017 (0.012)     -0.126 (0.606)
Age                   0.0002 (0.001)     0.008 (0.043)       0.001 (0.001)      -0.012 (0.051)
Education             0.010 (0.019)      -1.879** (0.857)    0.010 (0.022)      -0.571 (1.008)
Female                0.015 (0.020)      -0.678 (0.930)      -0.001 (0.025)     0.758 (1.111)
Partisanship          0.004 (0.005)      -0.187 (0.231)      0.005 (0.006)      -0.128 (0.279)
Algorithm Cond.       -0.072 (0.079)     -1.701 (4.101)      0.120 (0.097)      -9.706* (5.081)
Judge Cond.           0.006 (0.083)      -1.969 (4.307)      -0.030 (0.097)     1.395 (5.144)
Alg. x NC             0.093*** (0.024)   -3.007** (1.251)
Alg. x NE             -0.035* (0.021)    1.905* (1.106)
Judge x NC            0.054** (0.022)    -1.929* (1.168)
Judge x NE            -0.033 (0.022)     1.645 (1.140)
Alg. x Agreeable                                             -0.024 (0.016)     0.610 (0.851)
Alg. x Openness                                              0.028* (0.015)     0.228 (0.801)
Judge x Agreeable                                            0.031* (0.016)     -2.216** (0.876)
Judge x Openness                                             -0.005 (0.016)     1.275 (0.854)
Constant              0.225** (0.093)    30.664*** (6.182)   0.076 (0.115)      32.606*** (7.002)
N                     3,022              3,022               2,233              2,233
Log Likelihood        -403.901           -12,702.400         -349.032           -9,388.203
Akaike Inf. Crit.     839.802            25,436.800          730.063            18,808.410
Bayesian Inf. Crit.   936.021            25,533.020          821.441            18,899.780

Note: *p<0.1; **p<0.05; ***p<0.01


Notably, NC conditions trust in algorithmic advice more strongly than trust in advice from crowds or human experts. This is seen most clearly when comparing panels (a) and (b) in Figure 1: the gaps between the fit lines are more distinct in the algorithm condition (a) than in the judge condition (b), where there is only a modest distinction at the tails.
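The comparison in Figure 1 follows from how interaction terms combine: the effect of a treatment relative to crowd advice, at a given NC level, is the treatment coefficient plus the interaction coefficient times NC (plus the NE interaction times NE, since both appear in the model). The sketch below uses the point estimates from Table 2, column (1), and holds NE at a hypothetical value of 3. It ignores standard errors, and Figure 1 may plot predicted values rather than these differences, so it is an illustration of the slopes, not a reproduction of the figure.

```python
# Point estimates from Table 2, column (1): weight of advice.
COEF = {
    "algorithm": -0.072, "algorithm_nc": 0.093, "algorithm_ne": -0.035,
    "judge": 0.006, "judge_nc": 0.054, "judge_ne": -0.033,
}


def conditional_effect(condition: str, nc: float, ne: float) -> float:
    """Effect of a treatment vs. the crowd baseline at given NC and NE levels."""
    return (COEF[condition]
            + COEF[f"{condition}_nc"] * nc
            + COEF[f"{condition}_ne"] * ne)


NE_FIXED = 3.0  # hypothetical: NE held at the scale midpoint
for nc in range(1, 6):
    alg = conditional_effect("algorithm", nc, NE_FIXED)
    judge = conditional_effect("judge", nc, NE_FIXED)
    print(f"NC={nc}: algorithm {alg:+.3f}, judge {judge:+.3f}")
```

Each one-point increase in NC raises the algorithm effect by 0.093 but the judge effect by only 0.054, which is why the fit lines separate more sharply in the algorithm panel.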


Across all treatment conditions, the advice given by all three sources was the 
same, and we found no differences when we presented values centered around 
those derived from the COMPAS algorithm or when the advice was randomly 
chosen. 


Of note, for the weight and distance multilevel models, to test whether varying the treatments by scenario or by respondent had any impact, we randomly allocated half of our respondents to each type of assignment. We found no difference in the results (i.e., no detection of study purpose).
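One reading of this check is that “by scenario” means a fresh advice source is drawn for every profile, while “by respondent” means one source is drawn per respondent and reused for all profiles. The sketch below implements that interpretation, which is an assumption rather than the authors' documented procedure; `allocate` and the mode labels are hypothetical.

```python
import random

CONDITIONS = ("crowd", "algorithm", "judge")


def allocate(respondent_ids, n_profiles=8, seed=None):
    """Half the respondents get a fresh condition per profile ("by scenario");
    the other half keep one condition for every profile ("by respondent")."""
    rng = random.Random(seed)
    ids = list(respondent_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    plan = {}
    for position, rid in enumerate(ids):
        if position < half:
            plan[rid] = ("by_scenario",
                         [rng.choice(CONDITIONS) for _ in range(n_profiles)])
        else:
            fixed = rng.choice(CONDITIONS)
            plan[rid] = ("by_respondent", [fixed] * n_profiles)
    return plan


plan = allocate(range(10), seed=3)
for rid in sorted(plan):
    print(rid, plan[rid][0], plan[rid][1][:3])
```

Shuffling before splitting makes the allocation to assignment type itself random, so any difference between the two halves could be attributed to the assignment scheme rather than to respondent order.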


[Figure 1 panels: (a) Conditional Effect of NC & Algorithm Condition; (b) Conditional Effect of NC & Judge Condition. x-axis: Need for Cognition (1 to 5).]

Figure 1. Conditional Impacts of NC on Behavior


4 Discussion 


Overall, we found that personality influences both trust in automation and behavioral tasks related to public policy decision-making. In the first stage, there was a pronounced impact of personality on general levels of trust. In line with research finding high levels of trust in algorithms (2011), the significant conditioning role of these personality inventories suggests that traits associated with intellectual curiosity, agreeableness, and openness to advice-givers, as well as with being highly aware of one’s environment and more skeptical, are strongly associated with levels of trust in automation. The former group, composed of individuals who are more accepting of new information and experiences, is more trusting, while the latter group, which tends to be threatened by exogenous sources of information, is less trusting.

Regarding changes in respondents’ behavioral indicators of trust, high NC individuals are much more trusting of algorithms than of the wisdom of the crowd or, to a lesser extent, of a human expert. High NE individuals, whom we expected to feel threatened by exogenous advice, weigh advice from algorithms less than advice from human sources. Surprisingly, no effects were observed for agreeable individuals in the algorithmic advice condition for weighting, though there were weak effects for judges. Strikingly, both high NE and high NC individuals reacted more to algorithmic advice than to any other advice source (in opposite directions), though the effect is nearly twice as large for NC individuals as for NE individuals and is more statistically stable. Overall, personality significantly conditioned behavior in decision-making tasks, especially trust in advice from an algorithm.

Though our results are a blend of significant and null findings, we remain encouraged for two reasons. First, we uncovered strong evidence of personality influencing behavior and general levels of trust in automation, in line with our main goal. Given the newness of this topic, these results are useful for motivating future work on trust in automation and personality. Second, in line with Gerber et al. (2011), it would be unrealistic to expect all personality measures to explain all behavior. Of the Big 5, they note, “these traits have predictive power in an impressive variety of domains but are not universal predictors of all outcomes” (268). Our results corroborate the view that personality plays a role in trust in automation, though it does not explain the full breadth of general trust and behavior.

Regarding generalizability, while people generally trust algorithmic advice relative to other advice sources, levels of trust are influenced by personality traits. Because people differ in personality, not all people equally trust algorithms to make consequential decisions.


5 Limitations and Future Directions 


While we offer a starting place for future work on personality and trust in automation, a key limitation of our study is its exclusive focus on criminal justice. Should we expect similar results in other subfields, such as automation in medicine? Also, though Dietvorst et al. (2015) demonstrate that trust in algorithms wanes when mistakes are introduced, this phenomenon may be more likely for high NE individuals than for high NC individuals, given the skepticism with which high NE individuals start. Further, algorithm aversion may not be detectable for high NC individuals, while it may drive levels of trust for high NE individuals. Do the other three Big 5 personality traits (extroversion, conscientiousness, and emotional stability) also affect trust in automation? In sum, we suggest that researchers in this realm consider personality to provide a fuller picture of trust in a variety of subfields.

An additional limitation that may be addressed in future work is the nature of MTurk respondents in general: they are typically more educated and more liberal than the general population (2015), and thus may be more likely to trust automation. This possibility suggests that different samples may yield different results. More analysis and experiments in this vein would deepen the impact of our initial findings.
6 Concluding Remarks


In this study, we have demonstrated that personality plays a strong role in shaping individuals’ levels of trust in automation as they make public policy decisions. We bring psychology into the trust in automation discussion for several reasons. First, such an approach offers a baseline for understanding the role of innate, heritable characteristics and their influence on trust in automation. Such an understanding makes it clearer where to look for greater or lesser trust in algorithms, and where the basis of trust lies. These psychological characteristics are also widely used in many fields to describe human behavior both inside and outside of public policy. Given the rapid spread of algorithms and algorithmic advice in everyday life (2016), the role of psychological characteristics in conditioning virtually all human behavior (Eysenck, 1963), and the recent surge in research on algorithmic transparency (Rudin & Ustun, 2018), our study offers a timely exploration of the intersection of trust in automation and personality.
Appendix
A Supporting Information


A.1 Task Wording 


It has generally been found that even untrained individuals can do very well, 
sometimes even better than trained people or computer algorithms, at deter- 
mining the likelihood of a person committing another crime after their initial 
arrest. 

We are interested in knowing whether this accuracy can be further improved 
by combining individual judgement with the advice of crowds, experts, or algo- 
rithms. In what follows, you will be given an actual arrest record for a person 
arrested in Broward County, Florida. We already know whether the person com- 
mitted another crime within the next two years. You will be asked to give us a 
probability of the person re-offending along the following lines. 

We have collected advice from several sources: 


- Several Mechanical Turk surveys of people like yourself.

- A judge with over 10 years of experience.

- A machine learning algorithm, developed by computer scientists and criminal justice experts, that uses historic recidivism data to predict the probability of re-offending.


Warning: There are attention checks in this survey. We reserve 
the right to deny payment if a participant fails these checks, as that 
indicates the participant is not actually doing the tasks. 


A.2 Defendant Profiles 


The defendant is a male aged 22. They have been charged with: Possession of 
Cocaine. This crime is classified as a felony. They have been convicted of 0 prior 
crimes. They have 0 juvenile felony charges and 0 juvenile misdemeanor charges 
on their record. 

The defendant is a male aged 38. They have been charged with: Manufac- 
turing Cannabis/Marijuana. This crime is classified as a felony. They have been 
convicted of 3 prior crimes. They have 0 juvenile felony charges and 0 juvenile 
misdemeanor charges on their record. 

The defendant is a male aged 23. They have been charged with: Grand Theft. 
This crime is classified as a felony. They have been convicted of 3 prior crimes. 
They have 0 juvenile felony charges and 0 juvenile misdemeanor charges on their 
record. 
The defendant is a male aged 27. They have been charged with: Possession
of Meth. This crime is classified as a felony. They have been convicted of 5 prior 
crimes. They have 0 juvenile felony charges and 0 juvenile misdemeanor charges 
on their record. 

The defendant is a male aged 24. They have been charged with: Driving with 
a Revoked License. This crime is classified as a felony. They have been convicted 
of 2 prior crimes. They have 0 juvenile felony charges and 0 juvenile misdemeanor 
charges on their record. 

The defendant is a female aged 33. They have been charged with: Child 
Neglect. This crime is classified as a felony. They have been convicted of 1 prior 
crimes. They have 0 juvenile felony charges and 0 juvenile misdemeanor charges 
on their record. 

The defendant is a male aged 22. They have been charged with: Disorderly 
Conduct. This crime is classified as a misdemeanor. They have been convicted of 
0 prior crimes. They have 0 juvenile felony charges and 0 juvenile misdemeanor 
charges on their record. 

The defendant is a male aged 24. They have been charged with: Resisting 
an Officer with Violence. This crime is classified as a felony. They have been 
convicted of 0 prior crimes. They have 0 juvenile felony charges and 0 juvenile 
misdemeanor charges on their record. 
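The eight profiles above share identical wording apart from the defendant-specific fields (even “1 prior crimes” keeps the plural), which suggests they were filled from a fixed template. The sketch below reproduces that template; treating it as the generating mechanism is an assumption, and `render_profile` and its parameter names are hypothetical.

```python
def render_profile(sex, age, charge, degree, priors, juv_fel, juv_misd):
    """Fill the profile wording shown in Appendix A.2 (text copied verbatim)."""
    return (
        f"The defendant is a {sex} aged {age}. "
        f"They have been charged with: {charge}. "
        f"This crime is classified as a {degree}. "
        f"They have been convicted of {priors} prior crimes. "
        f"They have {juv_fel} juvenile felony charges and "
        f"{juv_misd} juvenile misdemeanor charges on their record."
    )


print(render_profile("male", 22, "Disorderly Conduct", "misdemeanor", 0, 0, 0))
```

Keeping the wording fixed and varying only the fields ensures that differences in respondents' forecasts reflect the defendant's record rather than the phrasing.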


A.3 Examples of Treatment 


A group of 200 people recruited from Mechanical Turk, on average rated the 
defendant as 80% likely to commit another felony crime within the next two 
years. 

Previously, you forecast that the defendant was [RESPONDENT’S PREVI- 
OUS FORECAST] likely to commit another felony crime within the next two 
years. 

If you would like to update your forecast, you can do so now. If 
not, just enter the same numbers as you entered previously. 

A judge with more than 10 years of experience rated the defendant as 80% 
likely to commit another felony crime within the next two years. 

Previously, you forecast that the defendant was [RESPONDENT’S PREVI- 
OUS FORECAST] likely to commit another felony crime within the next two 
years. 

If you would like to update your forecast, you can do so now. If 
not, just enter the same numbers as you entered previously. 

An algorithm developed by computer scientists and criminal justice researchers, 
based on a statistical analysis of thousands of past defendant records, rated the 
defendant as 80% likely to commit another felony crime within the next two 
years. 

Previously, you forecast that the defendant was [RESPONDENT’S PREVI- 
OUS FORECAST] likely to commit another felony crime within the next two 
years. 
If you would like to update your forecast, you can do so now. If
not, just enter the same numbers as you entered previously. 


A.4 MTurk Study Specifics 


In addition to the specifics of the design included in the manuscript, below are 
some additional specific items related to fielding the study on MTurk: 


1. Approval Rate: HIT Approval Rate > 95% 

2. Location: United States 

3. Study Description: Respondents will be asked to evaluate a series of real 
criminal profiles and asked to predict the likelihood of recidivism with and 
without the help of advice. 

4. Keywords: survey, criminal justice, forecasting, prediction


A.5 Personality Inventories 


A.5.1 NC and NE 

Please indicate the extent to which these statements are characteristic or uncharacteristic of you (On a scale from 1 to 5, with 1 being extremely characteristic and 5 being extremely uncharacteristic).


1. I have opinions about almost everything.

2. I like having responsibility for handling situations that require a lot of thinking.

3. It is very important to me to hold strong opinions.

4. I often prefer to remain neutral about complex issues.


A.5.2 TIPI for Big 5 

Here are a number of personality traits that may or may not apply to you. Please 
indicate the extent to which you agree or disagree that these characteristics apply 
to you. You should rate the extent to which the pair of traits applies to you, 
even if one characteristic applies more strongly than the other. (On a scale from 
1 to 7, with 1 = strongly disagree and 7 = strongly agree). 


1. Extroverted, enthusiastic

2. Critical, quarrelsome

3. Dependable, self-disciplined

4. Anxious, easily upset

5. Open to new experiences, complex

6. Reserved, quiet

7. Sympathetic, warm

8. Disorganized, careless

9. Calm, emotionally stable

10. Conventional, uncreative
A.6 Trust in Automation Index


Many organizations now use algorithms to make forecasts. Some high profile 
examples include the use of statistics in baseball to choose players (Moneyball) 
or Nate Silver’s use of statistics to predict elections. To what extent do you agree
or disagree with the following statements about algorithms? (On a scale from 
1 to 7, with 1 = strongly agree and 7 = strongly disagree). Given the variance 
in valence, items were coded so that the highest end of the response range (7) 
indicates high trust in automation and the lowest end of the range (1) indicates 
low trust. 


1. Using algorithms increases the chances of organizations achieving their goals.

2. Using algorithms increases the effectiveness of organizations in making good decisions.

3. Using algorithms improves the output quality for organizations.

4. Using algorithms makes it more likely for organizations to make errors.

5. Modern organizations rely too much on algorithms to make decisions about the future.

6. Using algorithms is an effective way to overcome human biases.

7. When I am uncertain about something, I will trust the information from an algorithm in place of my own judgement.

8. When I am uncertain about something, I will tend to trust my own intuition and judgement over the information from an algorithm.

A.7 Base Relationships: Empirical Motivation 


As an empirical motivation for our full study, we offer a short discussion of our base findings on the relative influence of the treatment conditions in the experiment. Table 3 shows the impact of advice from an algorithm or a judge, relative to the baseline category of average past MTurk respondents, on our two behavioral measures of trust: advice weight and distance to advice. First, the strong positive effects in the first model (column 1) for each condition suggest that respondents are reacting to the advice, with the magnitude of the effect in the algorithm condition nearly twice that of the judge condition. Second, the pronounced negative effects in the second model (column 2) demonstrate the impact of the algorithm and judge treatments in reducing the distance between respondents’ predictions and the advice-giver, relative to the baseline category. Similarly, the effect is nearly doubled in the algorithm condition.

These results demonstrate two things. First, respondents were significantly more likely to change their evaluations based on the advice of “experts,” whether human or machine-derived, than to trust the “wisdom of the crowd.” Second, the strongest effects are observed in the algorithm condition, suggesting that respondents trust algorithms to a greater degree than advice from humans. This is an important finding by itself, and one we explore in greater detail in a separate paper.
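The “nearly twice” comparisons can be checked against Table 3 directly. With crowd advice as the omitted category, each condition's predicted mean is the constant plus that condition's coefficient. The sketch below uses only the reported OLS point estimates; the dictionary and function names are hypothetical.

```python
# Coefficients from Table 3 (OLS; crowd advice is the omitted baseline).
WEIGHT = {"constant": 0.156, "algorithm": 0.134, "judge": 0.073}
DISTANCE = {"constant": 27.353, "algorithm": -6.563, "judge": -3.812}


def predicted_mean(coefs: dict[str, float], condition: str) -> float:
    """Constant plus the condition's dummy coefficient (zero for the baseline)."""
    return coefs["constant"] + coefs.get(condition, 0.0)


for cond in ("crowd", "judge", "algorithm"):
    print(f"{cond:>9}: weight {predicted_mean(WEIGHT, cond):.3f}, "
          f"distance {predicted_mean(DISTANCE, cond):.3f}")

print("algorithm/judge ratio, weight:", round(WEIGHT["algorithm"] / WEIGHT["judge"], 2))
print("algorithm/judge ratio, distance:", round(DISTANCE["algorithm"] / DISTANCE["judge"], 2))
```

The implied mean weight of advice rises from 0.156 (crowd) to 0.229 (judge) and 0.290 (algorithm), and the algorithm-to-judge ratios of the coefficients are about 1.84 for weight and 1.72 for distance, consistent with the “nearly twice” description.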
Table 3. Experimental Impacts on Dependent Variables of Interest

                                  Dependent variable:
                                  Advice Weight       Distance to Advice
                                  (1)                 (2)
Algorithm Condition               0.134*** (0.013)    -6.563*** (0.864)
Judge Condition                   0.073*** (0.013)    -3.812*** (0.857)
Constant                          0.156*** (0.009)    27.353*** (0.608)
Observations                      3,274               3,274
R²                                0.031               0.017
Adjusted R²                       0.030               0.017
Residual Std. Error (df = 3271)   0.305               20.113
F Statistic (df = 2; 3271)        52.076***           29.117***

Note: *p<0.1; **p<0.05; ***p<0.01


